Abstract — The problem of anomaly localization in a resource-constrained
cyber system is considered. Each anomalous component
of the system incurs a cost per unit time until its anomaly is
identified  and  fixed.  Different  anomalous  components  may  incur 
different  costs  depending  on  their  criticality  to  the  system.  Due 
to  resource  constraints,  only  one  component  can  be  probed  at 
each  given  time.  The  observations  from  a  probed  component  are 
realizations  drawn  from  two  different  distributions  depending  on 
whether  the  component  is  normal  or  anomalous.  The  objective 
is  a  probing  strategy  that  minimizes  the  total  expected  cost, 
incurred  by  all  the  components  during  the  detection  process, 
under  reliability  constraints.  We  consider  both  independent  and 
exclusive  models.  In  the  former,  each  component  can  be  abnormal 
with  a  certain  probability  independent  of  other  components.  In 
the  latter,  one  and  only  one  component  is  abnormal.  We  develop 
optimal  simple  index  policies  under  both  models.  The  proposed 
index  policies  apply  to  a  more  general  case  where  a  subset 
(more  than  one)  of  the  components  can  be  probed  simultaneously 
and  have  strong  performance  as  demonstrated  by  simulation 
examples. 

Index Terms — Anomaly localization, Sequential Probability
Ratio Test (SPRT), sequential hypothesis testing, detection
under uncertainty.

I.  Introduction 

We  consider  anomaly  localization  where  the  objective  is 
to  identify  anomalous  components  in  a  system  quickly  and 
reliably.  Consider  a  cyber  system  with  K  components.  Each 
component  may  be  in  a  normal  or  an  abnormal  state.  If 
abnormal, component k incurs a cost $c_k$ per unit time until its
anomaly is identified and fixed. Due to resource constraints,
only one component can be probed at a time, and switching
to a different component is allowed only when the state of
the current component is declared. The observations from a
probed component (say k) follow distributions $f_k^{(0)}$ or $f_k^{(1)}$
depending on whether the component is normal or anomalous,
respectively. The objective is a probing strategy that dynamically
determines the order of the sequential tests performed
on  all  the  components  so  that  the  total  cost  incurred  to  the 
system  during  the  entire  detection  process  is  minimized  under 
reliability  constraints. 
A. Main Results

The  above  problem  presents  an  interesting  twist  to  the 
classic  sequential  hypothesis  testing  problem.  In  the  case 
when there is only one component, minimizing the cost is
equivalent to minimizing the detection delay, and the problem
reduces to a classic sequential test, for which both the simple
and the composite hypothesis cases have been well studied. With
multiple components, however, minimizing the detection delay
of each component is no longer sufficient. The key to minimizing
the total cost is the order in which the components are
tested. It is intuitive that we should prioritize components
with  higher  costs  when  abnormal  and  components  with  higher 
prior  probabilities  for  being  abnormal.  Another  parameter  that 
plays a role in the total system cost is the expected time
needed to detect the state of a component, which depends on the
observation distributions $\{f_k^{(0)}, f_k^{(1)}\}$: it is desirable to place
components  that  require  longer  testing  time  toward  the  end  of 
the  testing  process.  The  challenge  here  is  how  to  balance  these 
key  parameters  in  the  dynamic  probing  strategy. 

We  show  in  this  paper  that  the  optimal  probing  strategy  is  an 
open-loop  policy  where  the  testing  order  can  be  predetermined, 
independent  of  the  realizations  of  each  individual  test  in  terms 
of  both  the  test  outcome  and  the  detection  time.  Furthermore, 
the  probing  order  is  given  by  a  simple  index.  Specifically, 
under the independent model where each component is abnormal
with probability $\pi_k$ independent of other components,
the index is of the form $\pi_k c_k / E(N_k)$, where $E(N_k)$ is the
expected detection time for component k. Under the exclusive
model where one and only one component is abnormal, the
index is of the form $\pi_k c_k / E(N_k \mid H_0)$, where $E(N_k \mid H_0)$
is the expected detection time for component k under the
hypothesis of it being normal. It is interesting to notice the
difference  in  the  indexes  for  these  two  models.  Intuitively 
speaking,  under  the  exclusive  model,  the  detection  times  of  the 
normal  components  tested  before  the  single  abnormal  one  add 
to  the  cost  incurred  by  the  abnormal  component,  while  under 
the  independent  model,  the  detection  time  of  any  component, 
normal  or  abnormal,  adds  to  the  delay  in  catching  the  next 
abnormal  component. 

The above simple index forms of the probing order are
optimal for both the simple hypothesis ($\{f_k^{(0)}, f_k^{(1)}\}_{k=1}^{K}$ are
known) and the composite hypothesis ($\{f_k^{(0)}, f_k^{(1)}\}_{k=1}^{K}$ have
unknown parameters) cases. These index policies also apply
to  the  case  where  more  than  one  component  can  be  probed 
simultaneously  and  offer  strong  performance  as  demonstrated 
by  simulation  examples.  Their  optimality  in  this  case,  however, 
remains  open. 
B. Applications

In  addition  to  anomaly  detection  in  cyber  systems,  the 
above  problem  also  finds  applications  in  spectrum  scanning 
in  cognitive  radio  systems  and  event  detection  in  sensor 
networks.  In  the  following  we  give  two  specific  examples. 

Consider  a  cyber  network  consisting  of  K  components 
(which  can  be  routers,  paths,  etc.).  Due  to  resource  constraints, 
only  a  subset  of  the  components  can  be  probed  at  a  time.  An 
Intrusion  Detection  System  (IDS)  analyzes  the  traffic  over  the 
components  to  detect  Denial  of  Service  (DoS)  attacks  (such 
attacks  rely  on  overwhelming  the  component  with  useless 
traffic  that  exceeds  its  capacity  so  as  to  make  it  unavailable 
for its intended use). Let the cost $c_k$ be the normal expected
traffic  (packets  per  unit  time)  over  component  k.  Thus,  in 
this  example  minimizing  the  total  expected  cost  minimizes 
the  total  expected  number  of  failed  packets  in  the  network 
during  DoS  attacks.  The  exclusive  model  applies  to  cases 
where  an  intrusion  to  a  subnet,  consisting  of  K  components, 
has  been  detected  and  the  probability  of  each  component  being 
compromised  is  small  (thus  with  high  probability,  there  is  only 
one  abnormal  component). 

Another  example  is  spectrum  sensing  in  cognitive  radio 
systems.  Consider  a  spectrum  consisting  of  K  orthogonal 
channels. Accessing an idle channel leads to a successful
transmission, while accessing a busy channel results in a collision
with  other  users.  A  Cognitive  Radio  (CR)  is  an  intelligent 
device  that  can  detect  and  access  idle  channels  in  the  wireless 
spectrum.  Due  to  resource  constraints,  only  a  subset  of  the 
channels  can  be  sensed  at  a  time.  Once  a  channel  is  identified 
as idle, the CR transmits over it. Let $c_k$ be the achievable rate
over  channel  k.  Thus,  in  this  example  minimizing  the  total 
expected  cost  minimizes  the  total  expected  loss  in  data  rate 
during  the  spectrum  sensing  process. 

C.  Related  Work 

The  anomaly  localization  problem,  studied  in  this  paper, 
presents an interesting twist to the classic sequential hypothesis
testing problem, which considers only a single stochastic
process. Sequential hypothesis testing was pioneered by Wald
[1], who derived the Sequential Probability Ratio Test (SPRT)
for binary hypothesis testing. Under the simple hypotheses
case,  the  SPRT  is  optimal  in  terms  of  minimizing  the  expected 
sample  size  under  given  type  I  and  type  II  error  probability 
constraints.  Various  extensions  for  M-ary  hypothesis  testing 
and  testing  composite  hypotheses  were  studied  in  [2]-[8]  for 
a single process. In these cases, asymptotically optimal
performance can be obtained as the error probability approaches
zero. 

Differing  from  this  work,  most  of  the  existing  studies  on 
sequential detection over multiple processes focus on minimizing
the total detection delay. Sequential detection over
independent processes has been considered in [9]-[14]. In
[9],  [10],  the  problem  of  quickly  detecting  an  idle  period  over 
multiple  independent  ON/OFF  processes  was  considered.  An 
optimal  threshold  policy  was  derived  in  [10].  The  ON/OFF 
nature  of  the  processes  and  the  objective  of  minimizing  the 
total detection delay make the problems considered in [9], [10]
fundamentally different from the one considered in this work.
In  [11],  the  problem  of  quickest  detection  of  idle  channels  over 
K  independent  channels  with  fixed  idle/busy  state  was  studied. 
The  objective  is  to  minimize  the  detection  delay  under  error 
constraints.  It  was  shown  that  the  optimal  policy  is  to  carry 
out  an  independent  SPRT  over  each  channel,  irrespective  of 
the testing order. In contrast to [11], we show in this paper that
the optimal policy in our model highly depends on the testing
order even when the processes are independent. In [12], the
problem of identifying the first abnormal sequence among an
infinite number of i.i.d. sequences was considered. An optimal
cumulative sum (CUSUM) test was established under this setting.
The sequential search problem under the exclusive model
was investigated in [15]-[18]. Optimal policies were derived
for the problem of quickest search over Wiener processes
[15]-[17]. It was shown in [15], [16] that the optimal policy is to
select the sequence with the highest posterior probability of
being the target at each given time. In [17], an SPRT-based
solution was derived, which is equivalent to the optimal policy
in the case of searching over Wiener processes. However,
minimizing  the  total  expected  cost  in  our  model  leads  to  a 
different  problem  and  consequently  a  different  index  policy. 

The  classic  target  whereabouts  problem  is  also  a  detection 
problem  over  multiple  processes.  In  this  problem,  multiple 
locations  are  searched  to  locate  a  target.  The  problem  is 
often  considered  under  the  setting  of  fixed  sample  size  as  in 
[19]-[22].  In  [19],  [20],  [22],  searching  in  a  specific  location 
provides  a  binary-valued  measurement  regarding  the  presence 
or  absence  of  the  target.  In  [21],  Castanon  considered  the 
dynamic  search  problem  under  continuous  observations:  the 
observations  from  a  location  without  the  target  and  with  the 
target have distributions $f$ and $g$, respectively. The optimal
policy was established under a symmetry assumption that
$f(x) = g(b - x)$ for some $b$.

The  anomaly  detection  problem  can  be  considered  as  a 
special  case  of  active  hypothesis  testing  in  which  the  decision 
maker  chooses  and  dynamically  changes  its  observation  model 
among  a  set  of  observation  options.  Classic  and  more  recent 
studies  of  general  active  hypothesis  testing  problems  can  be 
found in [23]-[27].

D.  Organization 

In  Section  II  we  describe  the  system  model  and  problem 
formulation. In Section III we propose a two-stage optimization
problem that simplifies computation while preserving optimality.
In Section IV we derive optimal algorithms under the
independent  and  exclusive  models  for  the  simple  hypotheses 
case.  In  Section  V  we  extend  our  results  to  the  composite 
hypothesis  case:  we  derive  asymptotically  optimal  algorithms 
under  the  independent  and  exclusive  models.  In  Section  VI 
we  provide  numerical  examples  to  illustrate  the  performance 
of  the  algorithms. 

II.  System  Model  and  Problem  Formulation 

Consider  a  cyber  system  consisting  of  K  components, 
where  every  component  may  be  in  a  normal  state  (i.e.,  healthy) 
or abnormal state. Define

$$\mathcal{H}_1 = \{k : 1 \le k \le K , \text{ component } k \text{ is abnormal}\} ,$$

$$\mathcal{H}_0 = \{k : 1 \le k \le K , \text{ component } k \text{ is healthy}\} ,$$

as the sets of the abnormal and healthy components.

We  consider  two  different  anomaly  models. 

1) Exclusive model: One and only one component is abnormal.
Component k is the abnormal one with a priori probability $\pi_k$,
where $\sum_{k=1}^{K} \pi_k = 1$.

2) Independent model: Each component k is abnormal with
a priori probability $\pi_k$ independent of other components.

Every abnormal component k incurs a cost $c_k$ ($0 < c_k < \infty$)
per  unit  time  until  it  is  tested  and  identified.  Components 
in  a  normal  state  do  not  incur  cost.  We  focus  on  the  case 
where  only  one  component  can  be  probed  at  a  time.  The 
resulting  probing  strategies  apply  to  the  case  where  a  subset 
of the components can be probed simultaneously, and their
performance in this case is studied via simulation examples
given in Sec. VI. When component k is tested at time t, a
measurement (or a vector of measurements) $y_k(t)$ is drawn
independently in a one-at-a-time manner. If component k is
healthy, $y_k(t)$ follows distribution $f_k^{(0)}$; if component k is
abnormal, $y_k(t)$ follows distribution $f_k^{(1)}$. We focus first on
the simple hypotheses case, where the distributions
$\{f_k^{(0)}, f_k^{(1)}\}_{k=1}^{K}$
are completely known. In Section V we extend our results to
the composite hypotheses case, where there is uncertainty in
the distribution parameters.

Let $k^*(t)$ denote the index of the component that is tested at
time t. Let $\mathbf{y}(t) = \{k^*(i), y_{k^*(i)}(i)\}_{i=1}^{t}$ be the set of all the
available observations (and the component indices) up to time
t. A selection rule is a mapping from $\mathbf{y}(t-1)$ to $\{1, 2, \ldots, K\}$,
which  indicates  which  component  is  chosen  to  be  tested  at  time 
t.  A  stopping  rule  and  a  decision  rule  are  used  to  decide  when 
to  terminate  the  test  and  which  components  are  declared  as 
abnormal,  respectively. 

Remark 1: Computing optimal policies for detection problems
involving multiple sequences becomes impractical in
general as the number of sequences or the sample size increases
[21]. In [15], [19]-[21], restrictive assumptions on the
distributions  were  used  to  obtain  simple  optimal  index  policies. 
In  [12],  [17],  [18],  restrictive  assumptions  on  the  search  model 
were  used  to  make  the  problem  mathematically  tractable.  Here, 
we  use  similar  assumptions  on  the  search  model  to  obtain  a 
mathematically  tractable  optimization  problem. 

We  consider  the  case  where  switching  between  components 
is  allowed  only  when  the  state  of  the  current  component  is 
declared  (i.e.,  switching  without  memory).  From  a  system 
perspective,  the  advantages  of  this  scheme  are  twofold.  First, 
switching  between  components  typically  adds  a  significant 
delay  that  should  be  avoided.  Second,  the  decision  maker 
stores  observations  of  only  one  component  at  each  time.  Thus, 
this  scheme  is  applicable  to  limited-memory  systems.  For 
convenience, we define $t_m$ as the time when the decision
maker starts the mth test. Let $\phi(t_m) \in \{1, 2, \ldots, K\}$ be a
selection rule that indicates which component is probed at
time $t_m$. The vector of selection rules for the K components
is denoted by $\boldsymbol{\phi} = (\phi(t_1), \ldots, \phi(t_K))$. Let $\mathbf{1}_k(t_m)$ be the
probing indicator function, where $\mathbf{1}_k(t_m) = 1$ if component
k is probed at time $t_m$ and $\mathbf{1}_k(t_m) = 0$ otherwise.

Let $\tau_k$ be a stopping time (or a stopping rule), which is the
time when the decision maker stops taking observations
from component k and declares its state. The vector of
stopping times for the K components is denoted by
$\boldsymbol{\tau} = (\tau_1, \ldots, \tau_K)$. The random sample size required to make a
decision regarding the state of component k is denoted by
$N_k$. For example, if the decision maker tests component 1
followed by component 2, then $\tau_1 = N_1$ and $\tau_2 = N_1 + N_2$.
Let $\delta_k \in \{0, 1\}$ be a decision rule, which the decision maker
uses to declare the state of component k at time $\tau_k$: $\delta_k = 0$ if
the decision maker declares that component k is in a healthy
state (i.e., $H_0$), and $\delta_k = 1$ if the decision maker declares
that component k is in an abnormal state (i.e., $H_1$). The
vector of decision rules for the K components is denoted by
$\boldsymbol{\delta} = (\delta_1, \ldots, \delta_K)$. An admissible strategy s is a sequence of
K sequential tests for the K components and is denoted by the
tuple $s = (\boldsymbol{\tau}, \boldsymbol{\delta}, \boldsymbol{\phi})$.

The  problem  is  to  find  a  strategy  s  that  minimizes  the  total 
expected  cost,  incurred  by  all  the  abnormal  components  until 
declaring  their  states,  subject  to  type  I  (false-alarm)  and  type 
II  (miss-detect)  error  constraints  for  each  component: 


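$$\inf_{s} \; E\Big\{ \sum_{k \in \mathcal{H}_1} c_k \tau_k \Big\} \quad \text{s.t.} \quad P_k^{FA} \le \alpha_k , \; P_k^{MD} \le \beta_k \quad \forall k = 1, \ldots, K, \qquad (2)$$

where $P_k^{FA}$ and $P_k^{MD}$ denote the false-alarm and miss-detect
probabilities for component k, respectively.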


Applying  type  I  and  type  II  error  constraints  for  every 
component  was  done  in  [11]  for  the  problem  of  quickest 
spectrum  scanning  over  K  independent  channels.  In  this  case, 
the  optimal  solution  is  a  sequence  of  SPRTs  (irrespective  of 
the testing order) for the K channels [11]. However, in this
paper  we  show  that  minimizing  the  total  expected  cost  leads 
to  different  solutions. 

Remark  2:  Note  that  the  definition  of  (2)  does  not  include 
the cost due to missed-detection events of abnormal components.
However, the probability of missed-detection events
decreases exponentially with the sample size [1], [24]. Since
the  error  probability  is  typically  required  to  be  small,  (2)  well 
approximates  the  actual  loss  in  practice. 

Remark  3:  In  contrast  to  the  case  of  minimizing  the  total 
delay,  in  our  model  sampling  normal  components  after  all  the 
abnormal  components  have  been  identified  does  not  incur  cost. 
Therefore,  applying  a  sequence  of  K  sequential  tests  with  type 
I  and  type  II  error  constraints  (2)  for  every  component  is 
reasonable  for  both  independent  and  exclusive  models  (note 
that  in  our  model  the  decision  maker  is  allowed  to  declare 
more  than  one  component  as  abnormal  under  the  exclusive 
model).  From  a  system  perspective,  this  formulation  makes 
the  scheme  robust  against  mistakes  in  the  system  model  (for 
instance,  if  an  exclusive  model  is  assumed,  but  there  is  more 
than  one  abnormal  component  in  the  system). 

We  develop  optimal  and  asymptotically  optimal  algorithms 
to  solve  (2)  under  the  simple  and  composite  hypotheses 
cases,  respectively.  We  show  that  the  optimal  probing  strategy 
follows a simple index rule and is predetermined at time $t_1$
(i.e., open-loop) under both the independent and exclusive
models. 

III.  Decoupling  of  Ordering  and  Sequential 
Testing 

In  this  section,  we  show  that  the  probing  order  and  the 
sequential  testing  of  each  component  can  be  decoupled.  As  a 
consequence,  the  solution  to  (2)  can  be  obtained  in  two  stages. 
At the first stage, the problem is to find a stopping rule $\tau_k$
and a decision rule $\delta_k$ for every component k that minimize
the expected sample size given $H_i$ subject to error probability
constraints:

$$\inf_{\tau_k, \delta_k} E(N_k \mid H_i) , \; i = 0, 1 \quad \text{s.t.} \quad P_k^{FA} \le \alpha_k , \; P_k^{MD} \le \beta_k . \qquad (3)$$


For  the  simple  hypotheses  case,  the  solution  to  the  first-stage 
optimization problem (3) is given by the SPRT [1].

Assume that component k is tested at time t = 1. Let

$$L_k(n) = \frac{\prod_{i=1}^{n} f_k^{(1)}(y_k(i))}{\prod_{i=1}^{n} f_k^{(0)}(y_k(i))} \qquad (4)$$

be the Likelihood Ratio (LR) between the two hypotheses for
component k at stage n.

Let $A_k, B_k$ ($B_k > 1/A_k$) be the boundary values used by
the SPRT for component k, such that the error constraints are
satisfied. In the SPRT algorithm, at each stage n, the LR is
compared to the boundary values as follows:

• If $L_k(n) \in (A_k^{-1}, B_k)$, continue to take observations
from component k.

• If $L_k(n) \ge B_k$, stop taking observations from component
k and declare it as abnormal (i.e., $\delta_k = 1$). Clearly, $N_k = n$.

• If $L_k(n) \le A_k^{-1}$, stop taking observations from
component k and declare it as normal (i.e., $\delta_k = 0$).
Clearly, $N_k = n$.
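
As an illustration, the three SPRT steps listed above can be sketched in Python. This is a minimal sketch rather than the authors' implementation: the function name `sprt`, the use of log-densities `log_f0`, `log_f1`, and the choice to raise an error when the supplied samples run out are all assumptions made for the example. The test is carried out on the log-likelihood ratio, which is equivalent to comparing $L_k(n)$ with $A_k^{-1}$ and $B_k$.

```python
import math
from typing import Callable, Iterable, Tuple


def sprt(samples: Iterable[float],
         log_f0: Callable[[float], float],
         log_f1: Callable[[float], float],
         A: float, B: float) -> Tuple[int, int]:
    """Run an SPRT with continuation region (1/A, B) for the likelihood ratio.

    Returns (decision, n): decision is 1 (abnormal) or 0 (normal) and n is
    the number of samples used, i.e. the sample size N_k.
    """
    log_upper = math.log(B)    # declare abnormal once log L >= log B
    log_lower = -math.log(A)   # declare normal once log L <= log(1/A)
    log_lr = 0.0
    n = 0
    for y in samples:
        n += 1
        # Accumulate the log-likelihood ratio of the i.i.d. observations.
        log_lr += log_f1(y) - log_f0(y)
        if log_lr >= log_upper:
            return 1, n
        if log_lr <= log_lower:
            return 0, n
    raise ValueError("samples exhausted before the SPRT reached a decision")
```

For unit-variance Gaussian observations with mean 0 under the normal state and mean 1 under the abnormal state (an assumed example), one may pass `log_f0 = lambda y: -0.5 * y * y` and `log_f1 = lambda y: -0.5 * (y - 1.0) ** 2`; the normalizing constants cancel in the ratio.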

Remark 4: Implementation of the SPRT requires computation
of $A_k$ and $B_k$ ensuring the constraints on the error
probability. In general, the exact determination of the boundary
values is very laborious and depends on the observation
distribution. Wald’s approximation can be applied to simplify
the computation [1]:

$$B_k \approx \frac{1 - \beta_k}{\alpha_k} , \qquad A_k \approx \frac{1 - \alpha_k}{\beta_k} . \qquad (5)$$

Wald’s approximation performs well for small $\alpha_k, \beta_k$. Since
type I and type II errors are typically required to be small,
Wald’s approximation is widely used in practice [1].
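
A short sketch of the boundary computation in (5) is given below. The function name `wald_boundaries` and the input validation are assumptions of this example; the formulas are the ones stated in (5).

```python
from typing import Tuple


def wald_boundaries(alpha: float, beta: float) -> Tuple[float, float]:
    """Return Wald's approximate SPRT boundaries (A, B) from (5).

    alpha: type I (false-alarm) error constraint.
    beta:  type II (miss-detect) error constraint.
    The continuation region for the likelihood ratio is (1/A, B).
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    A = (1.0 - alpha) / beta
    B = (1.0 - beta) / alpha
    return A, B
```

Note that $A_k B_k = (1 - \alpha_k)(1 - \beta_k) / (\alpha_k \beta_k)$, so the requirement $B_k > 1/A_k$ holds whenever $\alpha_k + \beta_k < 1$.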

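At the second stage, the problem is to find a selection rule
$\boldsymbol{\phi}$ that minimizes the objective function, given the solution to
the K subproblems (3):

$$\inf_{\boldsymbol{\phi}} \; E\Big\{ \sum_{k \in \mathcal{H}_1} c_k \tau_k \;\Big|\; (\boldsymbol{\tau}^*, \boldsymbol{\delta}^*) \Big\} , \qquad (6)$$

where

$$\boldsymbol{\tau}^* = (\tau_1^*, \ldots, \tau_K^*) , \qquad \boldsymbol{\delta}^* = (\delta_1^*, \ldots, \delta_K^*) \qquad (7)$$

denote the vectors of stopping times and decision rules,
respectively, that solve the K subproblems (3).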

The  solutions  to  the  second-stage  optimization  problem  for  the 
independent  and  exclusive  models  are  given  in  Section  IV. 

The  formulation  of  the  two-stage  optimization  problem 
allows  us  to  decompose  the  original  optimization  problem  (2) 
into  K  +  1  subproblems  (3)  and  (6).  In  subsequent  sections 
we  show  that  the  two-stage  optimization  problem  preserves 
optimality  under  both  the  independent  and  exclusive  models. 

IV.  The  Simple  Hypotheses  Case 

In  this  section  we  derive  optimal  solutions  to  both  the 
independent and exclusive models when the observation distributions
under both hypotheses are completely known. Under
the independent model, the posterior probability of component
k being abnormal can be updated at time $t_{m+1}$ as follows:

$$\pi_k(t_{m+1}) = (1 - \mathbf{1}_k(t_m)) \, \pi_k(t_m) + \frac{\mathbf{1}_k(t_m) \, \pi_k(t_m) \, f_k^{(1)}(\mathbf{y}_k(N_k))}{\pi_k(t_m) f_k^{(1)}(\mathbf{y}_k(N_k)) + (1 - \pi_k(t_m)) f_k^{(0)}(\mathbf{y}_k(N_k))} , \qquad (8)$$

where $\pi_k(t_1) = \pi_k$ denotes the a priori probability of component
k being abnormal. The term $\mathbf{y}_k(N_k) = \{y_k(i)\}_{i=1}^{N_k}$
denotes the $N_k$-size vector of observations, taken from component
k.
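
The update in (8) is a per-component Bayes rule: only the belief of the probed component changes. The sketch below illustrates this; the function name `update_independent` and its argument layout (likelihoods of the whole observation vector passed in as numbers) are assumptions of the example.

```python
from typing import List


def update_independent(prior: List[float], probed: int,
                       lik1: float, lik0: float) -> List[float]:
    """Posterior beliefs under the independent model, following (8).

    prior:  beliefs pi_k(t_m) for all components.
    probed: index of the component tested during the m-th test.
    lik1:   likelihood of its observation vector under the abnormal state.
    lik0:   likelihood of its observation vector under the normal state.
    """
    post = list(prior)  # unprobed components keep their beliefs
    p = prior[probed]
    post[probed] = p * lik1 / (p * lik1 + (1.0 - p) * lik0)
    return post
```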

Under the exclusive model, $\pi_k(t_{m+1})$ is given in (9).
Note that in contrast to the independent
model, under the exclusive model the beliefs of all the
components  are  changed  at  each  time  due  to  the  dependency 
across  components.  The  posterior  probabilities  depend  on  the 
selection  rule  and  the  collected  measurements. 
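
To illustrate the dependency across components, the following sketch applies Bayes' rule under the exclusive model: observing the probed component rescales the beliefs of all components. The function name `update_exclusive` and its argument layout are assumptions of this example, and the code is an illustration of the Bayes update rather than a transcription of the original listing.

```python
from typing import List


def update_exclusive(prior: List[float], probed: int,
                     lik1: float, lik0: float) -> List[float]:
    """Posterior beliefs under the exclusive model (exactly one abnormal).

    prior:  beliefs pi_k(t_m), summing to one.
    probed: index of the component tested during the m-th test.
    lik1:   likelihood of its observations if the probed component is abnormal.
    lik0:   likelihood of its observations if the probed component is normal.
    """
    p = prior[probed]
    # Probability of the observations, averaged over the probed component's state.
    denom = p * lik1 + (1.0 - p) * lik0
    return [
        (q * lik1 if k == probed else q * lik0) / denom
        for k, q in enumerate(prior)
    ]
```

Because the probed component is either the abnormal one (weight `lik1`) or not (weight `lik0` for every other candidate), the updated beliefs again sum to one.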

A.  Optimal  Index  Policies 

Based on the solution to the two-stage optimization problem,
we propose Algorithms 1, 2, presented in Tables I, II,
to  solve  (2).  In  [28],  the  problem  of  ordering  operations  (or 
components)  with  a  given  processing  time  was  considered.  It 
was  shown  that  the  optimal  selection  rule  for  the  problem  of 
minimizing  an  expected  weighted  sum  of  completion  times 
is to select the components in decreasing order of $c_k / E(N_k)$.
However,  the  problem  in  (6)  is  different.  First,  the  components 
may  be  normal  or  abnormal  and  the  expected  sample  size 
depends  on  the  component  state.  Second,  the  objective  is 
to  minimize  an  expected  weighted  sum  of  stopping  times 
of abnormal components only. Third, under the exclusive
model, the state of each component depends on other components.
Furthermore, the original optimization (2) is also
over  the  stopping  rules  which  control  the  expected  sample 
size.  Here,  we  derive  optimal  selection  rules  that  solve  the 
second-stage  optimization  problem  (6)  for  the  independent 
and  exclusive  models.  These  selection  rules  are  given  in  step 
1  in  Tables  I,  II  for  the  independent  and  exclusive  models, 
respectively.  Arranging  the  components  in  decreasing  order 
of $\pi_k(t_1) c_k / E(N_k)$ or $\pi_k(t_1) c_k / E(N_k \mid H_0)$ in step 1 can be
done in $O(K \log K)$ time via sorting algorithms. Next, by
the  optimal  solution  to  (3),  a  series  of  SPRTs  is  performed 
according  to  this  order  until  all  the  components  are  tested. 
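$$\pi_k(t_{m+1}) = \frac{\mathbf{1}_k(t_m) \, \pi_k(t_m) \, f_k^{(1)}(\mathbf{y}_k(N_k))}{\pi_k(t_m) f_k^{(1)}(\mathbf{y}_k(N_k)) + (1 - \pi_k(t_m)) f_k^{(0)}(\mathbf{y}_k(N_k))}
+ \frac{(1 - \mathbf{1}_k(t_m)) \, \pi_k(t_m) \, f_{\phi(t_m)}^{(0)}(\mathbf{y}_{\phi(t_m)}(N_{\phi(t_m)}))}{\pi_{\phi(t_m)}(t_m) f_{\phi(t_m)}^{(1)}(\mathbf{y}_{\phi(t_m)}(N_{\phi(t_m)})) + (1 - \pi_{\phi(t_m)}(t_m)) f_{\phi(t_m)}^{(0)}(\mathbf{y}_{\phi(t_m)}(N_{\phi(t_m)}))} \qquad (9)$$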


TABLE I

Algorithm 1 for the independent model

1. arrange the components in decreasing
   order of $\pi_k(t_1) c_k / E(N_k)$

2. for k = 1, ..., K components do:

3.    perform SPRT for component k,
      with $P_k^{FA} \le \alpha_k$, $P_k^{MD} \le \beta_k$

TABLE II

Algorithm 2 for the exclusive model

1. arrange the components in decreasing
   order of $\pi_k(t_1) c_k / E(N_k \mid H_0)$

2. for k = 1, ..., K components do:

3.    perform SPRT for component k,
      with $P_k^{FA} \le \alpha_k$, $P_k^{MD} \le \beta_k$

The index policies, described in Algorithms 1, 2, are intuitively
satisfying. The priority of component k in terms of
testing order should be higher as the cost $c_k$ increases, or the
a priori probability of being abnormal $\pi_k(t_1)$ increases. Under
the  independent  model,  the  priority  of  component  k  in  terms 
of  testing  order  should  be  higher  as  the  expected  sample  size 
$E(N_k)$ decreases (since $E(N_k)$ contributes to the cost of every
component  which  is  tested  after  component  k).  On  the  other 
hand,  under  the  exclusive  model,  the  priority  of  component 
k in terms of testing order should be higher as $E(N_k \mid H_0)$
decreases. Note that under the exclusive model, we take into
account the expected sample size under $H_0$ solely. The reason
is  that  if  component  k  is  abnormal,  there  is  no  additional  cost, 
incurred  by  other  components  (since  only  one  component  is 
abnormal).  On  the  other  hand,  if  component  k  is  healthy,  then 
$E(N_k \mid H_0)$ contributes to the cost of the components which are
tested  after  component  k  (and  may  be  abnormal).  The  SPRT 
is  used  in  both  models  to  minimize  the  expected  sample  size 
to  reduce  the  total  cost. 

The  optimality  of  Algorithms  1,  2  is  shown  in  the  following 
theorem. 

Theorem  1:  Under  the  independent  and  exclusive  models, 
Algorithms  1,2,  respectively,  solve  the  original  optimization 
problem  (2). 

Proof:  See  Appendices  VIII-A  and  VIII-B.  ■ 

Note that Algorithms 1, 2 use open-loop selection rules (as
stated in step 1), where the component order is predetermined
at time $t_1$. However, Theorem 1 is not restricted to open-loop
selection rules. Theorem 1 shows that Algorithms 1, 2
are optimal among the class of both open-loop and closed-loop
selection rules.
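
For small K, the index ordering can be checked numerically against all fixed probing orders. The sketch below is an illustration under stated assumptions, not a proof and not the authors' code: it compares open-loop orders only, it takes the expected sample sizes as given numbers, and the function names are hypothetical. The expected cost of a fixed order is computed as $\sum_k \pi_k c_k (\text{expected time spent on earlier components} + E(N_k \mid H_1))$, where each earlier component contributes $E(N_j)$ under the independent model and $E(N_j \mid H_0)$ under the exclusive model.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def expected_total_cost(order: Sequence[int], pi: Sequence[float],
                        c: Sequence[float], delay: Sequence[float],
                        own: Sequence[float]) -> float:
    """Expected total cost of testing the components in a fixed order.

    delay[k]: expected time that testing k adds before later components
              (E(N_k) for the independent model, E(N_k|H_0) for the exclusive).
    own[k]:   expected time to test k itself when it is abnormal, E(N_k|H_1).
    """
    elapsed = 0.0
    total = 0.0
    for k in order:
        total += pi[k] * c[k] * (elapsed + own[k])
        elapsed += delay[k]
    return total


def index_order(pi: Sequence[float], c: Sequence[float],
                delay: Sequence[float]) -> List[int]:
    """Components sorted by decreasing index pi_k * c_k / delay_k."""
    return sorted(range(len(pi)),
                  key=lambda k: pi[k] * c[k] / delay[k], reverse=True)


def brute_force_best(pi: Sequence[float], c: Sequence[float],
                     delay: Sequence[float],
                     own: Sequence[float]) -> Tuple[int, ...]:
    """Exhaustive search over all K! fixed orders (small K only)."""
    return min(permutations(range(len(pi))),
               key=lambda o: expected_total_cost(o, pi, c, delay, own))
```

Comparing `expected_total_cost(index_order(...), ...)` with the cost of `brute_force_best(...)` on a few parameter sets gives a quick sanity check of the index rule.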


B.  Computing  the  Index 

Arranging the components in decreasing order of
$\pi_k(t_1) c_k / E(N_k)$ or $\pi_k(t_1) c_k / E(N_k \mid H_0)$ requires one
to compute the expected sample size $E(N_k \mid H_i)$ for all
k = 1, 2, ..., K. In general, it is difficult to obtain a closed-form
expression for $E(N_k \mid H_i)$. However, since the solution
to (3) is given by the SPRT, Wald’s approximation can be
applied to simplify the computation [1]. For every i, j = 0, 1,
let

$$D_k(i \| j) \triangleq E_i\left( \log \frac{f_k^{(i)}(y_k(1))}{f_k^{(j)}(y_k(1))} \right) \qquad (10)$$

be the Kullback-Leibler (KL) divergence between the hypotheses
$H_i$ and $H_j$, where the expectation is taken with respect to
$f_k^{(i)}$.
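The expected sample size conditioned on each hypothesis is
well approximated by [1]:

$$E(N_k \mid H_0) \approx \frac{(1 - \alpha_k) \log A_k - \alpha_k \log B_k}{D_k(0 \| 1)} , \qquad E(N_k \mid H_1) \approx \frac{(1 - \beta_k) \log B_k - \beta_k \log A_k}{D_k(1 \| 0)} , \qquad (11)$$

where $A_k \approx (1 - \alpha_k)/\beta_k$ and $B_k \approx (1 - \beta_k)/\alpha_k$ are the
approximations to the boundary values given in (5).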

The expected sample size required to make a decision regarding
the state of component k is given by:

$$E(N_k) = \pi_k E(N_k \mid H_1) + (1 - \pi_k) E(N_k \mid H_0) , \qquad (12)$$

where the approximation approaches the exact expected sample
size for small $\alpha_k, \beta_k$.
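
The quantities needed for the index can be computed directly from (5), (11), and (12). The sketch below assumes the KL divergences are supplied by the caller; the function name `expected_sample_sizes` and its argument names are assumptions of this example. For instance, for unit-variance Gaussian observations whose mean shifts from 0 to 1 (an assumed example), both divergences equal 0.5.

```python
import math
from typing import Tuple


def expected_sample_sizes(alpha: float, beta: float,
                          d01: float, d10: float,
                          prior: float) -> Tuple[float, float, float]:
    """Wald's approximations to E(N|H0), E(N|H1) and the average E(N).

    alpha, beta: type I and type II error constraints.
    d01: KL divergence D(0||1), expectation taken under the normal state.
    d10: KL divergence D(1||0), expectation taken under the abnormal state.
    prior: a priori probability that the component is abnormal.
    """
    # Approximate SPRT boundaries, as in (5).
    A = (1.0 - alpha) / beta
    B = (1.0 - beta) / alpha
    # Conditional expected sample sizes, as in (11).
    en0 = ((1.0 - alpha) * math.log(A) - alpha * math.log(B)) / d01
    en1 = ((1.0 - beta) * math.log(B) - beta * math.log(A)) / d10
    # Unconditional expected sample size, as in (12).
    en = prior * en1 + (1.0 - prior) * en0
    return en0, en1, en
```

The index of component k is then `prior * cost / en` under the independent model and `prior * cost / en0` under the exclusive model.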


V.  The  Composite  Hypotheses  Case 

In the previous section we focused on the simple hypotheses
case, where the distributions under both hypotheses are
completely known. For this case, the SPRT was applied in
Algorithms 1, 2 to solve (3). However, in numerous cases there
is uncertainty in the observation distributions.

For example, consider a one-parameter distribution. Suppose
that it is required to test $\theta_k \le \theta_k^{(0)}$ against $\theta_k \ge \theta_k^{(1)}$, where
$\theta_k^{(1)} > \theta_k^{(0)}$. As discussed in [1], the SPRT can be applied to this
problem by testing $\theta_k = \theta_k^{(0)}$ against $\theta_k = \theta_k^{(1)}$, where the
boundary values are set such that the error constraints are
satisfied at $\theta_k = \theta_k^{(0)}, \theta_k^{(1)}$. For some important cases, such as an
exponential family of distributions, this sequential test has the
property that type I and type II errors are less than $\alpha_k, \beta_k$ for
all $\theta_k \le \theta_k^{(0)}$ and $\theta_k \ge \theta_k^{(1)}$, respectively. However, while the
SPRT minimizes the expected sample size at $\theta_k = \theta_k^{(0)}, \theta_k^{(1)}$, it
is highly sub-optimal for other values of $\theta_k$, as demonstrated in
Section VI. Therefore, other techniques should be considered
under the composite hypotheses case.

Let $\theta_k$ be a vector of unknown parameters of component k.
The observations $\{y_k(i)\}_{i \ge 1}$ are drawn from a common distribution
$f(y \mid \theta_k)$, $\theta_k \in \Theta_k$, where $\Theta_k$ is the parameter space
of component k. If component k is healthy, then $\theta_k \in \Theta_k^{(0)}$; if
component k is abnormal, then $\theta_k \in \Theta_k \setminus \Theta_k^{(0)}$. Let $\Theta_k^{(0)}, \Theta_k^{(1)}$
be disjoint subsets of $\Theta_k$, where $I_k = \Theta_k \setminus (\Theta_k^{(0)} \cup \Theta_k^{(1)}) \ne \emptyset$
is an indifference region$^1$. When $\theta_k \in I_k$, the detector is
indifferent regarding the state of component k. Hence, there
are no constraints on the error probabilities for all $\theta_k \in I_k$.
The hypothesis test regarding component k is to test

$\theta_k \in \Theta_k^{(0)}$ against $\theta_k \in \Theta_k^{(1)}$.

Narrowing $I_k$ has the price of increasing the sample size.

Let 

Gk(n)  =arg  max  f  (yk(n)\Gk), 

Hi)  (13) 

Gk  (n)  =  arg  max  /  (; yk(n)\Gk ), 
ek&ei'> 

be  the  Maximum-Likelihood  Estimates  (MLEs)  of  the  parame- 

(i) 

ters  over  the  parameter  spaces  Sk,  Qk  at  stage  n,  respectively. 

In  contrast  to  the  SPRT  (for  the  simple  hypotheses  case), 
the  theory  of  sequential  tests  of  composite  hypotheses  does 
not  provide  optimal  performance  in  terms  of  minimizing  the 
expected  sample  size  under  given  error  constraints.  Neverthe¬ 
less,  asymptotically  optimal  performance  can  be  obtained  as 
the  error  probability  approaches  zero. 

First,  we  provide  an  overview  of  existing  sequential  tests 
for  composite  hypotheses  which  are  relevant  to  our  problem. 
Next,  we  apply  these  techniques  to  solve  (2). 


A.  Existing  Sequential  Tests  for  Composite  Hypothesis  Testing 

The key idea is to use the estimated parameters to perform a one-sided sequential test to reject $H_0$ and a one-sided sequential test to reject $H_1$. Note that these techniques were introduced for a single process. However, in this paper we apply sequential tests for $K$ components. Thus, we use the subscript $k$ to denote the component index.

1) Sequential Generalized Likelihood Ratio Test (SGLRT):
We  refer  to  sequential  tests  that  use  the  Generalized  Likelihood 
Ratio  (GLR)  statistics  as  the  SGLRT. 

For $i = 0, 1$, let

$$L_k^{(i),\mathrm{GLR}}(n) = \log \frac{\prod_{r=1}^{n} f(y_k(r)|\hat{\theta}_k(n))}{\prod_{r=1}^{n} f(y_k(r)|\hat{\theta}_k^{(i)}(n))} \tag{14}$$

be the GLR statistics used to reject hypothesis $H_i$ at stage $n$. Let

$$N_k^{(i)} = \inf\left\{ n : L_k^{(i),\mathrm{GLR}}(n) > B_k^{(i)} \right\} \tag{15}$$

be the stopping rule used to reject hypothesis $H_i$, where $B_k^{(i)}$ is the boundary value.

For each component $k$, the decision maker stops the sampling when $N_k = \min\{N_k^{(0)}, N_k^{(1)}\}$. If $N_k = N_k^{(0)}$, component $k$ is declared as abnormal (i.e., $H_0$ is rejected). If $N_k = N_k^{(1)}$, component $k$ is declared as normal (i.e., $H_0$ is accepted).

$^1$The assumption of an indifference region is widely used in the theory of sequential testing of composite hypotheses to derive asymptotically optimal performance. Nevertheless, in some cases this assumption can be removed. For more details, the reader is referred to [4].

The SGLRT was first studied by Schwartz [2], who considered a one-parameter exponential family and assigned a cost of $c$ for each observation and a loss function for wrong decisions. It was shown that setting $B_k^{(i)} = \log(c^{-1})$ asymptotically minimizes the Bayes risk as $c$ approaches zero. A refinement was studied by Lai [4], [6], who set a time-varying boundary value $B_k^{(i)} \sim \log((nc)^{-1})$. Lai showed that for a multivariate exponential family this scheme asymptotically minimizes both the Bayes risk and the expected sample size subject to error constraints as $c$ approaches zero [6].
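As an illustration of the stopping rule in (14) and (15), the following is a minimal sketch for Poisson observations with one-sided hypotheses $\theta \le \theta^{(0)}$ against $\theta \ge \theta^{(1)}$. It is not the authors' implementation. The function names and the fixed thresholds `b0`, `b1` are assumptions made for the example, and the restricted MLE is taken to be the sample mean clipped to the corresponding set (which coincides with the boundary whenever the sample mean falls outside that set). For Poisson data the GLR statistic reduces to $n$ times the KL divergence between the sample mean and the restricted estimate.

```python
import math


def poisson_glr(total, n, restricted_mle):
    """GLR statistic for n Poisson samples summing to `total`.

    Equals n * KL(sample_mean || restricted_mle); the log(y!) terms cancel.
    """
    mean = total / n
    log_term = mean * math.log(mean / restricted_mle) if mean > 0 else 0.0
    return n * (restricted_mle - mean + log_term)


def sglrt_poisson(observations, theta0, theta1, b0, b1):
    """Run both one-sided GLR tests (hypothetical helper).

    Returns ("abnormal", n) if the test rejecting H0 stops first,
    ("normal", n) if the test rejecting H1 stops first,
    and (None, n) if the observations run out before either boundary is crossed.
    """
    total = 0
    n = 0
    for n, y in enumerate(observations, start=1):
        total += y
        mean = total / n
        mle0 = min(mean, theta0)  # MLE restricted to theta <= theta0
        mle1 = max(mean, theta1)  # MLE restricted to theta >= theta1
        if poisson_glr(total, n, mle0) > b0:
            return "abnormal", n
        if poisson_glr(total, n, mle1) > b1:
            return "normal", n
    return None, n
```

When both statistics cross their boundaries at the same stage, this sketch arbitrarily reports the rejection of $H_0$; that tie-breaking choice is an assumption of the example.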

2)  Sequential  Adaptive  Likelihood  Ratio  Test  (SALRT):  We 
refer  to  sequential  tests  that  use  the  Adaptive  Likelihood  Ratio 
(ALR)  statistics  as  the  SALRT. 

For $i = 0, 1$, let

$$L_k^{(i),\mathrm{ALR}}(n) = \log \frac{\prod_{r=1}^{n} f(y_k(r)|\hat{\theta}_k(r-1))}{\prod_{r=1}^{n} f(y_k(r)|\hat{\theta}_k^{(i)}(n))} \tag{16}$$

be the ALR statistics used to reject hypothesis $H_i$ at stage $n$. Let

$$N_k^{(i)} = \inf\left\{ n : L_k^{(i),\mathrm{ALR}}(n) > B_k^{(i)} \right\} \tag{17}$$

be the stopping rule used to reject hypothesis $H_i$, where $B_k^{(i)}$ is the boundary value.

For each component $k$, the decision maker stops the sampling when $N_k = \min\{N_k^{(0)}, N_k^{(1)}\}$. If $N_k = N_k^{(0)}$, component $k$ is declared as abnormal. If $N_k = N_k^{(1)}$, component $k$ is declared as normal.

The  SALRT  was  first  introduced  by  Robbins  and  Siegmund 
[3]  to  design  power-one  sequential  tests.  Pavlov  used  it  to 
design  asymptotically  (as  the  error  probability  approaches 
zero)  optimal  (in  terms  of  minimizing  the  expected  sample 
size  subject  to  error  constraints)  tests  for  composite  hypothesis 
testing  of  the  multivariate  exponential  family  [5].  Tartakovsky 
established  asymptotically  optimal  performance  for  a  more 
general  multivariate  family  of  distributions  [7]. 
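The adaptive statistic in (16) can be sketched in the same Poisson setting. The block below is an illustrative sketch under stated assumptions, not the authors' code: the name `salrt_poisson`, the user-supplied `initial_estimate` standing in for $\hat{\theta}_k(0)$, and the small positive floor that keeps the logarithm finite are all hypothetical choices. Each observation is scored with the estimate formed from the earlier observations only, while the denominator uses the restricted MLE at the current stage.

```python
import math


def restricted_loglik(total, n, theta):
    """Poisson log-likelihood of n samples summing to `total`, omitting log(y!)."""
    return (total * math.log(theta) if total > 0 else 0.0) - n * theta


def salrt_poisson(observations, theta0, theta1, b0, b1, initial_estimate):
    """Run both one-sided adaptive likelihood ratio tests (hypothetical helper).

    Returns ("abnormal", n), ("normal", n), or (None, n) if no boundary is crossed.
    """
    adaptive = 0.0               # sum over r of log f(y_r | estimate from r-1 samples)
    total = 0
    estimate = initial_estimate  # assumed starting value for the adaptive estimate
    n = 0
    for n, y in enumerate(observations, start=1):
        adaptive += y * math.log(estimate) - estimate
        total += y
        mean = total / n
        if adaptive - restricted_loglik(total, n, min(mean, theta0)) > b0:
            return "abnormal", n
        if adaptive - restricted_loglik(total, n, max(mean, theta1)) > b1:
            return "normal", n
        estimate = max(mean, 1e-9)  # floor keeps log(estimate) finite (assumption)
    return None, n
```

Because early terms of the adaptive sum are never recomputed, a poor `initial_estimate` keeps influencing the statistic, which is the drawback of the SALRT noted in this section.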

The advantage of using the SALRT is that setting $B_k^{(0)} = \log\frac{1}{P_k^{FA}}$, $B_k^{(1)} = \log\frac{1}{P_k^{MD}}$ satisfies the error probability constraints in (3). However, such a simple setting cannot be applied to the SGLRT. Thus, implementing the SALRT is much simpler than implementing the SGLRT. The disadvantage of using the SALRT is that poor early estimates (based on a small number of observations) can never be revised, even after a large number of observations has been collected.

B. Asymptotically Optimal Index Policies

Under the composite hypotheses case, one should modify step 3 in Algorithms 1, 2, given in Tables I, II, by performing the SGLRT or SALRT instead of the SPRT. We refer to the modified algorithms as Algorithms 3, 4, respectively. In the following theorems, we show that Algorithms 3, 4 are asymptotically optimal in terms of minimizing the objective function subject to the error constraints (2) as the error probabilities approach zero$^2$. When deriving asymptotics we assume that $P_k^{FA} \to 0$, $P_k^{MD} \to 0$ for all $k$ such that the asymptotic optimality property in terms of minimizing the expected sample size subject to the error constraints holds for each single process for both SGLRT and SALRT, as discussed in Section V-A.

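Theorem 2: Consider the independent model under the composite hypotheses case. Let $(\tau^*, \delta^*, \phi^*)$ be the optimal solution to (2). Let $(\tau^{A3}, \delta^{A3}, \phi^{A3})$ be the solution achieved by Algorithm 3. Then, as $P_k^{FA} \to 0$, $P_k^{MD} \to 0$ for all $k$, we obtain:

$$\mathbf{E}\left\{ \sum_{k \in \mathcal{H}_1} c_k \tau_k \,\middle|\, (\tau^{A3}, \delta^{A3}, \phi^{A3}) \right\} \sim \mathbf{E}\left\{ \sum_{k \in \mathcal{H}_1} c_k \tau_k \,\middle|\, (\tau^*, \delta^*, \phi^*) \right\}. \tag{18}$$

Proof: See Appendix VIII-C.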


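Theorem 3: Consider the exclusive model under the composite hypotheses case. Let $(\tau^*, \delta^*, \phi^*)$ be the optimal solution to (2). Let $(\tau^{A4}, \delta^{A4}, \phi^{A4})$ be the solution achieved by Algorithm 4. Then, as $P_k^{FA} \to 0$, $P_k^{MD} \to 0$ for all $k$, we obtain:

$$\mathbf{E}\left\{ \sum_{k \in \mathcal{H}_1} c_k \tau_k \,\middle|\, (\tau^{A4}, \delta^{A4}, \phi^{A4}) \right\} \sim \mathbf{E}\left\{ \sum_{k \in \mathcal{H}_1} c_k \tau_k \,\middle|\, (\tau^*, \delta^*, \phi^*) \right\}. \tag{19}$$

Proof: See Appendix VIII-D.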


C.  Computing  the  Index 

Arranging the components in decreasing order of $\pi_k(t_1)c_k/\mathbf{E}(N_k)$ or $\pi_k(t_1)c_k/\mathbf{E}(N_k|H_0)$ requires one to compute the expected sample size $\mathbf{E}(N_k|H_i)$ for all $k = 1, 2, ..., K$. In general, it is difficult to obtain a closed-form expression for the exact value of $\mathbf{E}(N_k|H_i)$. However, we can use the asymptotic property of the tests to obtain a closed-form approximation to $\mathbf{E}(N_k|H_i)$, which approaches the exact expected sample size as the error probability approaches zero.

For every $i = 0, 1$, let

$$D_k(\theta_k\|\lambda) = \mathbf{E}_{\theta_k}\left( \log \frac{f(y|\theta_k)}{f(y|\lambda)} \right) \tag{20}$$

2  As  shown  in  the  proof  of  Theorems  2,  3,  the  index  policies  are  still  optimal 
in  terms  of  testing  order.  The  asymptotic  optimality  is  due  to  the  performance 
of  the  sequential  test  under  the  composite  hypothesis  case. 


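be the KL divergence between the real value of $\theta_k$ and $\lambda$, where the expectation is taken with respect to $f(y|\theta_k)$, and let

$$D_k^*(\theta_k\|\Theta_k^{(i)}) = \inf_{\lambda \in \Theta_k^{(i)}} D_k(\theta_k\|\lambda). \tag{21}$$

Let $P^{(i)}(\theta_k)$ be a prior distribution on $\theta_k$ under hypothesis $H_i$ at component $k$. Then, as $P_k^{FA} \to 0$, $P_k^{MD} \to 0$, the expected sample size is given by:

$$\mathbf{E}(N_k|H_0) \sim \int_{\theta_k \in \Theta_k^{(0)}} \frac{\log B_k^{(1)}}{D_k^*(\theta_k\|\Theta_k^{(1)})}\, dP^{(0)}(\theta_k),$$

$$\mathbf{E}(N_k|H_1) \sim \int_{\theta_k \in \Theta_k^{(1)} \cup I_k^{(1)}} \frac{\log B_k^{(0)}}{D_k^*(\theta_k\|\Theta_k^{(0)})}\, dP^{(1)}(\theta_k) + \int_{\theta_k \in I_k^{(0)}} \frac{\log B_k^{(1)}}{D_k^*(\theta_k\|\Theta_k^{(1)})}\, dP^{(1)}(\theta_k), \tag{22}$$

where $I_k^{(0)}, I_k^{(1)}$ are disjoint subsets of $I_k$ and $I_k = I_k^{(0)} \cup I_k^{(1)}$. For all $\theta_k \in I_k^{(i)}$ we have

$$\frac{\log B_k^{(j)}}{D_k^*(\theta_k\|\Theta_k^{(j)})} \le \frac{\log B_k^{(i)}}{D_k^*(\theta_k\|\Theta_k^{(i)})}$$

for $i, j = 0, 1$.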

The expected sample size required to make a decision regarding the state of component $k$ is given by:

$$\mathbf{E}(N_k) = \pi_k \mathbf{E}(N_k|H_1) + (1 - \pi_k)\mathbf{E}(N_k|H_0), \tag{23}$$

which can be well approximated for small error probability using (22).

Remark 5: In numerous cases, uncertainty is associated solely with the abnormal state, while the distribution under the normal state is completely known. In these cases, evaluating $\mathbf{E}(N_k)$ to implement Algorithm 3 depends on the prior distribution of $\theta_k \in \Theta_k \setminus \Theta_k^{(0)}$, while evaluating $\mathbf{E}(N_k|H_0)$ to implement Algorithm 4 does not.


VI.  Numerical  Examples 

In this section we present numerical examples to illustrate the performance of the algorithms. Consider a cyber network consisting of $K$ components (which can be routers, paths, etc.), as discussed in Section I-B. Assume that an intruder tries to launch a DoS or Reduction of Quality (RoQ) attack by sending a large number of packets to a component. RoQ attacks inflict damage on the component while keeping a low profile to avoid detection; they do not cause denial of service.

To detect such attacks, the IDS performs traffic-based anomaly detection. It monitors the traffic at each component to decide whether a component is compromised. Roughly speaking, if the actual arrival rate is significantly higher than the arrival rate under the normal state, then the IDS should declare that the component is in an abnormal state. A similar traffic-based detection technique was proposed in [29] for a different model, considering a single process without switching to other components. For each component $k$, we assume that packets arrive according to a Poisson process with rate $\theta_k$. When component $k$ is tested, the IDS collects an observation $y_k(n) \in \mathbb{N}_0$ every time unit, which represents the number of packets that arrived in the interval $(n-1, n)$. Assume that the IDS considers component $k$ as normal if $\theta_k \le \theta_k^{(0)}$, and tests $\theta_k \le \theta_k^{(0)}$ against $\theta_k \ge \theta_k^{(1)}$ (i.e., $I_k = \{\theta_k \mid \theta_k^{(0)} < \theta_k < \theta_k^{(1)}\}$ is the indifference region). We set $c_k = \theta_k^{(0)}$. As discussed in Section I-B, under this setting the optimization problem minimizes the maximal damage to the network in terms of packet loss.


A.  Detection  Under  Simple  Hypotheses 

We consider the case where the observations follow Poisson distributions $y_k(n) \sim \mathrm{Poi}(\theta_k^{(0)})$ or $y_k(n) \sim \mathrm{Poi}(\theta_k^{(1)})$ depending on whether component $k$ is healthy or abnormal, respectively, where $\theta_k^{(0)}, \theta_k^{(1)}$ are known to the IDS. To implement Algorithms 1, 2 (which are optimal in this scenario for the independent and exclusive models, respectively), we need to compute the LR between the hypotheses, defined in (4), and the expected sample sizes under the hypotheses, which can be well approximated by (11). Let $\Lambda_k(n) = \log L_k(n)$ be the Log-Likelihood Ratio (LLR) between the two hypotheses of component $k$ at stage $n$, where $L_k(n)$ is defined in (4). After algebraic manipulations, it can be verified that the LLR is given by:

$$\Lambda_k(n) = -n\left(\theta_k^{(1)} - \theta_k^{(0)}\right) + \log\left(\frac{\theta_k^{(1)}}{\theta_k^{(0)}}\right) \sum_{i=1}^{n} y_k(i). \tag{24}$$

It can be verified that the KL divergence between the hypotheses $H_i$ and $H_j$, defined in (10), is given by:

$$D_k(i\|j) = \theta_k^{(j)} - \theta_k^{(i)} + \theta_k^{(i)} \log\left(\theta_k^{(i)}/\theta_k^{(j)}\right). \tag{25}$$

Substituting (25) in (11) yields the required approximation to the expected sample size.
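The closed forms (24) and (25) are straightforward to evaluate. The sketch below is illustrative only (the function names are assumptions) and takes the LR in (4) to be the likelihood under the abnormal rate divided by the likelihood under the normal rate.

```python
import math


def poisson_llr(observations, theta0, theta1):
    """Log-likelihood ratio of Poi(theta1) against Poi(theta0), as in (24)."""
    n = len(observations)
    return -n * (theta1 - theta0) + math.log(theta1 / theta0) * sum(observations)


def poisson_kl(theta_i, theta_j):
    """KL divergence D(i||j) between Poi(theta_i) and Poi(theta_j), as in (25)."""
    return theta_j - theta_i + theta_i * math.log(theta_i / theta_j)
```

A useful sanity check is that the expected per-sample LLR under the abnormal rate, $-(\theta^{(1)} - \theta^{(0)}) + \theta^{(1)}\log(\theta^{(1)}/\theta^{(0)})$, equals `poisson_kl(theta1, theta0)`.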

Next, we provide numerical examples to illustrate the performance of the algorithms. We compared three schemes: a Random selection SPRT (R-SPRT), where a series of SPRTs is performed until all the components are tested in a random order (which is optimal for the problem of minimizing the detection delay over independent processes [11]), and the proposed Algorithms 1, 2, which are optimal under the independent and exclusive models, respectively.

Let $\Delta = (100 - 10)/(K - 1)$. We set $c_k = \theta_k^{(0)} = 10 + (k - 1)\Delta$ (i.e., the costs are equally spaced in the interval $[10, 100]$) and $\theta_k^{(1)} = 1.5 \cdot \theta_k^{(0)}$. The error constraints were set to $P_k^{FA} = 10^{-2}$, $P_k^{MD} = 10^{-6}$ for all $k$. For the independent and exclusive models, we set $\pi_k = 0.8$ and $\pi_k = 1/K$ for all $k$, respectively. The performance of Algorithms 1 and 2 is presented in Fig. 1(a) and 1(b) under the independent and exclusive models, respectively, and compared to the R-SPRT. It can be seen that the proposed algorithms save roughly 50% of the objective value as compared to the R-SPRT under both the independent and exclusive model scenarios.

Fig. 1. Objective value as a function of the number of components under the independent and exclusive models. (a) An independent model scenario. (b) An exclusive model scenario.

Next, we simulate the independent model when 2 components are observed at a time and the total number of components is $K = 6$. Note that in this case Algorithm 1 may not be optimal. We use an exhaustive search as a benchmark to demonstrate the performance of Algorithm 1 in this scenario. The exhaustive search is done by performing a sequence of $K$ SPRTs among all the possible testing orders. Then, the minimal objective value is chosen as a benchmark. We set the maximal cost to $c_{\max} = 100$ and the costs are equally spaced in the interval $[c_{\min}, 100]$. The error constraints were set to $P_k^{FA} = P_k^{MD} = 10^{-2}$ for all $k$. The performance gain of the exhaustive search scheme over Algorithm 1 as a function of $c_{\min}$ is presented in Fig. 2. It can be seen that Algorithm 1 almost achieves the performance of the exhaustive search scheme in this scenario for all $c_{\min}$. For small $c_{\min}$ both algorithms perform the same, since the difference between the indices increases. The exhaustive search outperforms Algorithm 1 for $c_{\min} > 97$, but the gain remains very small.


B.  Detection  Under  Uncertainty 

We consider the case of composite hypotheses, where there is uncertainty in the distribution parameters, as discussed in Section V. To implement the asymptotically optimal Algorithms 3, 4, we need to compute the GLR or ALR statistics, defined in (14), (16), and the expected sample sizes under the hypotheses, which can be well approximated by (22). The MLEs of the parameters over the parameter spaces $\Theta_k$, $\Theta_k^{(i)}$ are given by the sample mean and the boundary of the alternative parameter space, respectively. As a result, substituting $\hat{\theta}_k(n) = \frac{1}{n}\sum_{i=1}^{n} y_k(i)$, $\hat{\theta}_k^{(i)}(n) = \theta_k^{(i)}$ in (14), (16) yields the GLR and ALR statistics, respectively. The KL divergence between the real value of $\theta_k$ and the parameter space $\Theta_k^{(i)}$ is given by:

$$D_k^*(\theta_k\|\Theta_k^{(i)}) = \theta_k^{(i)} - \theta_k + \theta_k \log\left(\theta_k/\theta_k^{(i)}\right). \tag{26}$$

Substituting (26) in (22) yields the approximate expected sample size.

Fig. 2. Performance gain of an exhaustive search over Algorithm 1 as a function of $c_{\min}$ under the independent model.
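The divergence in (26) can be evaluated as follows. This is a sketch under the assumption that the hypotheses are one-sided, $\Theta^{(0)} = \{\theta \le \theta^{(0)}\}$ and $\Theta^{(1)} = \{\theta \ge \theta^{(1)}\}$; the function names are hypothetical. Because the Poisson KL divergence is convex in its second argument with its minimum at the true rate, the infimum over a one-sided set is attained at the boundary when the true rate lies outside the set, and is zero otherwise.

```python
import math


def kl_poisson(theta, lam):
    """KL divergence between Poi(theta) and Poi(lam)."""
    return lam - theta + theta * math.log(theta / lam)


def kl_to_lower_set(theta, theta0):
    """Infimum of KL(theta || lam) over lam <= theta0 (boundary form of (26))."""
    return kl_poisson(theta, theta0) if theta > theta0 else 0.0


def kl_to_upper_set(theta, theta1):
    """Infimum of KL(theta || lam) over lam >= theta1 (boundary form of (26))."""
    return kl_poisson(theta, theta1) if theta < theta1 else 0.0
```

With the rates used in this section, a true rate of 25 is measured against the boundary 19 of the normal set, while its distance to the abnormal set (rates of at least 21) is zero.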

Next, we provide numerical examples to illustrate the performance of the algorithms under uncertainty. We simulated a network with homogeneous components (i.e., any selection rule is optimal). We compared three schemes: the R-SPRT, and Algorithms 3 or 4 (which achieve the same performance in this case) using the SALRT and the SGLRT, discussed in Section V-A. We set $\theta_k^{(0)} = 19$, $\theta_k^{(1)} = 21$. Under uncertainty, the IDS considers component $k$ as normal if $\theta_k \le \theta_k^{(0)}$, and tests $\theta_k \le \theta_k^{(0)}$ against $\theta_k \ge \theta_k^{(1)}$ (i.e., $I_k = \{\theta_k \mid 19 < \theta_k < 21\}$ is the indifference region). To implement the SGLRT, we set the cost per observation $c = 10^{-3}$. According to the assigned cost, we obtained the following error probability constraints for all $k$: $P_k^{FA} \le 0.026$ for all $\theta_k \le 19$ and $P_k^{MD} \le 0.03$ for all $\theta_k \ge 21$. We do not restrict the detector's performance for $19 < \theta_k < 21$ (note that narrowing the indifference region has the price of increasing the required sample size). In Fig. 3 we show the average number of observations (in a log scale) required for the anomaly detection as a function of $\theta_k$. As expected, for $\theta_k = 19$ and $\theta_k = 21$ the R-SPRT requires a lower sample size as compared to the proposed schemes. On the other hand, it can be seen that for most values of $\theta_k$ the SGLRT and the SALRT require a lower sample size as compared to the R-SPRT. The SALRT performs the worst for $18 < \theta_k < 22$, and performs the best for $\theta_k \notin (18, 22)$, roughly. The SGLRT obtains the best average performance. It can be seen that for large values of $\theta_k$ the anomaly is detected very quickly, since the distance between the hypotheses increases. This result confirms that DoS attacks are much easier to detect than RoQ attacks.

Fig. 3. Average number of observations as a function of the arrival rate of packets (denoted by $\theta$).

VII.  Conclusion 

The  problem  of  anomaly  localization  in  a  resource- 
constrained  cyber  system  was  investigated.  Due  to  resource 
constraints,  only  one  component  can  be  probed  at  a  time. 
The  observations  are  realizations  drawn  from  two  different 
distributions  depending  on  whether  the  component  is  normal 
or  anomalous.  An  abnormal  component  incurs  a  cost  per  unit 
time  until  it  is  tested  and  identified.  The  problem  was  formu¬ 
lated  as  a  constrained  optimization  problem.  The  objective  is 
to  minimize  the  total  expected  cost  subject  to  error  probability 
constraints.  We  considered  two  different  anomaly  models:  the 
independent  model  in  which  each  component  can  be  abnormal 
independent  of  other  components,  and  the  exclusive  model 
in  which  there  is  one  and  only  one  abnormal  component. 
For  the  simple  hypotheses  case,  we  derived  optimal  algo¬ 
rithms  for  both  independent  and  exclusive  models.  For  the 
composite  hypotheses  case,  we  derived  asymptotically  (as  the 
error  probability  approaches  zero)  optimal  algorithms  for  both 
independent and exclusive models. These optimal algorithms have low complexity.

The algorithms developed in this paper can be applied to other models of anomaly detection as well. The proposed algorithms can be adapted to any detection scheme that performs a series of tests until all the components are tested. The required modification is in step 3 of the algorithms, where the SPRT/SALRT/SGLRT is replaced by any given test. Such modified algorithms minimize the objective function among all the algorithms that perform the given test.

VIII.  Appendix 

A. Proof of Theorem 1 Under the Exclusive Model

Let $\mathbf{E}'(N_k|H_i, t)$ be the expected sample size achieved by a stopping rule and a decision rule $(\tau'_k(t), \delta'_k(t))$, depending on the time that component $k$ is tested (i.e., $(\tau'_k(t), \delta'_k(t))$ depend on the selection rule), such that error constraints are satisfied. Let $\mathbf{E}^{A2}(N_k|H_i)$ be the expected sample size achieved by the SPRT's stopping rule and decision rule $(\tau^{A2}, \delta^{A2})$, independent of the time that component $k$ is tested (i.e., $(\tau^{A2}, \delta^{A2})$ are independent of the selection rule), such that error constraints are satisfied. Clearly, $\mathbf{E}^{A2}(N_k|H_i) \le \mathbf{E}'(N_k|H_i, t)$ for all $k, t$, for $i = 0, 1$.

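Step 1: Proving the theorem for $K = 2$:

Assume that

$$\frac{\pi_1(t_1)c_1}{\mathbf{E}^{A2}(N_1|H_0)} \ge \frac{\pi_2(t_1)c_2}{\mathbf{E}^{A2}(N_2|H_0)}.$$

Consider selection rules $\phi^{(1)}, \phi^{(2)}$ that select component 1 first followed by component 2 and component 2 first followed by component 1, respectively. The expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(2)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau'(t), \delta'(t), \phi^{(2)}) \right\} = \mathbf{E}'(N_2|H_1, t_1)\,\pi_2(t_1)c_2 + \left( \mathbf{E}'(N_2|H_0, t_1) + \mathbf{E}'(N_1|H_1, t_2) \right) \pi_1(t_1)c_1.$$

The expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(1)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau'(t), \delta'(t), \phi^{(1)}) \right\} = \mathbf{E}'(N_1|H_1, t_1)\,\pi_1(t_1)c_1 + \left( \mathbf{E}'(N_1|H_0, t_1) + \mathbf{E}'(N_2|H_1, t_2) \right) \pi_2(t_1)c_2. \tag{29}$$

Note that the expected cost achieved by both selection rules can be further reduced by minimizing the expected sample sizes (such that error constraints are satisfied) independent of the selection rules, which is achieved by $(\tau^{A2}, \delta^{A2})$. Therefore, an optimal solution must be $(\tau^{A2}, \delta^{A2}, \phi^{(1)})$ or $(\tau^{A2}, \delta^{A2}, \phi^{(2)})$. Next, we use the interchange argument to prove the theorem for $K = 2$. The expected cost achieved by $(\tau^{A2}, \delta^{A2}, \phi^{(2)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau^{A2}, \delta^{A2}, \phi^{(2)}) \right\} = \mathbf{E}^{A2}(N_2|H_1)\,\pi_2(t_1)c_2 + \left( \mathbf{E}^{A2}(N_2|H_0) + \mathbf{E}^{A2}(N_1|H_1) \right) \pi_1(t_1)c_1. \tag{30}$$

The expected cost achieved by $(\tau^{A2}, \delta^{A2}, \phi^{(1)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau^{A2}, \delta^{A2}, \phi^{(1)}) \right\} = \mathbf{E}^{A2}(N_1|H_1)\,\pi_1(t_1)c_1 + \left( \mathbf{E}^{A2}(N_1|H_0) + \mathbf{E}^{A2}(N_2|H_1) \right) \pi_2(t_1)c_2. \tag{31}$$

The expected cost achieved by $\phi^{(1)}$ is lower than that achieved by $\phi^{(2)}$ since $\frac{\pi_1(t_1)c_1}{\mathbf{E}^{A2}(N_1|H_0)} \ge \frac{\pi_2(t_1)c_2}{\mathbf{E}^{A2}(N_2|H_0)}$, which completes the proof for $K = 2$.

Step 2: Proving the theorem by induction on the number of components $K$:

Assume that the theorem is true for $K - 1$ components (where one and only one component is abnormal). Assume that

$$\frac{\pi_1(t_1)c_1}{\mathbf{E}^{A2}(N_1|H_0)} \ge \frac{\pi_2(t_1)c_2}{\mathbf{E}^{A2}(N_2|H_0)} \ge \cdots \ge \frac{\pi_K(t_1)c_K}{\mathbf{E}^{A2}(N_K|H_0)}. \tag{32}$$

Consider the case of $K$ components and denote $\phi^{(j)}$ as an optimal selection rule that selects component $j$ first.

Step 2.1: Proving the theorem for the last $K - 1$ components:

Next, we show that the last $K - 1$ components must be selected in decreasing order of $\pi_k(t_1)c_k/\mathbf{E}^{A2}(N_k|H_0)$ and tested by the SPRT.

Let

$$\gamma_j(t) = \frac{1}{\pi_j(t)\,\dfrac{f_j^{(1)}(\mathbf{y}_j(N_j))}{f_j^{(0)}(\mathbf{y}_j(N_j))} + 1 - \pi_j(t)}. \tag{33}$$

Note that when the decision maker completes testing component $j$, the other components update their beliefs according to:

$$\pi_k(t_2) = \gamma_j(t_1)\,\pi_k(t_1), \quad \forall k \neq j. \tag{34}$$

The expected cost achieved by $\phi^{(j)}$ given the outcome (at time $t_2$) by testing component $j$ (i.e., given the observations vector $\mathbf{y}_j(N_j)$) is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, \phi^{(j)}, \mathbf{y}_j(N_j) \right\} = \pi_j(t_2)c_j N_j + (1 - \pi_j(t_2)) \times \mathbf{E}\left\{ \sum_{k=1, k \neq j}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, \phi^{(j)}, \mathbf{y}_j(N_j), j \in \mathcal{H}_0 \right\}. \tag{35}$$

Let

$$\tilde{\tau}_k = \tau_k - N_j \quad \forall k \neq j \tag{36}$$

be the modified stopping time, defined as the stopping time from $t = N_j + 1$ until testing of component $k$ is completed. Thus, we can rewrite (35) as:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, \phi^{(j)}, \mathbf{y}_j(N_j) \right\} = \sum_{k=1}^{K} \pi_k(t_2)c_k N_j + (1 - \pi_j(t_2)) \times \mathbf{E}\left\{ \sum_{k=1, k \neq j}^{K} c_k \tilde{\tau}_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, \phi^{(j)}, \mathbf{y}_j(N_j), j \in \mathcal{H}_0 \right\}. \tag{37}$$

The term $\sum_{k=1}^{K} \pi_k(t_2)c_k N_j$ in (37) follows since,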


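$$\Pr\left(k \in \mathcal{H}_1 \mid \phi^{(j)}, \mathbf{y}_j(N_j), j \in \mathcal{H}_0\right) = \frac{\Pr\left(k \in \mathcal{H}_1, j \in \mathcal{H}_0 \mid \phi^{(j)}, \mathbf{y}_j(N_j)\right)}{\Pr\left(j \in \mathcal{H}_0 \mid \phi^{(j)}, \mathbf{y}_j(N_j)\right)} = \frac{\Pr\left(k \in \mathcal{H}_1 \mid \phi^{(j)}, \mathbf{y}_j(N_j)\right)}{\Pr\left(j \in \mathcal{H}_0 \mid \phi^{(j)}, \mathbf{y}_j(N_j)\right)} = \frac{\pi_k(t_2)}{1 - \pi_j(t_2)}. \tag{38}$$

Minimizing

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, \phi^{(j)}, \mathbf{y}_j(N_j) \right\} \tag{39}$$

at time $t_2$ requires one to minimize

$$\mathbf{E}\left\{ \sum_{k=1, k \neq j}^{K} c_k \tilde{\tau}_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, \phi^{(j)}, \mathbf{y}_j(N_j), j \in \mathcal{H}_0 \right\} \tag{40}$$

in (37).

Note that (40) is the cost for $K - 1$ components (where one and only one component is abnormal) starting at time $t = t_2 = N_j + 1$, with prior probability $\tilde{\pi}_k(t_2) = \frac{\pi_k(t_2)}{1 - \pi_j(t_2)}$ for component $k \neq j$ being abnormal. By the induction hypothesis, for any optimal selection rule $\phi^{(j)}$ that selects component $j$ first, arranging the last $K - 1$ components in decreasing order of $\tilde{\pi}_k(t_2)c_k/\mathbf{E}^{A2}(N_k|H_0)$ (and testing them by the SPRT) minimizes (40).

Since

$$\tilde{\pi}_k(t_2) = \frac{\gamma_j(t_1)\,\pi_k(t_1)}{1 - \pi_j(t_2)} \quad \forall k \neq j, \tag{41}$$

then

$$\frac{\tilde{\pi}_1(t_2)c_1}{\mathbf{E}^{A2}(N_1|H_0)} \ge \frac{\tilde{\pi}_2(t_2)c_2}{\mathbf{E}^{A2}(N_2|H_0)} \ge \cdots \ge \frac{\tilde{\pi}_{j-1}(t_2)c_{j-1}}{\mathbf{E}^{A2}(N_{j-1}|H_0)} \ge \frac{\tilde{\pi}_{j+1}(t_2)c_{j+1}}{\mathbf{E}^{A2}(N_{j+1}|H_0)} \ge \cdots \ge \frac{\tilde{\pi}_K(t_2)c_K}{\mathbf{E}^{A2}(N_K|H_0)}. \tag{42}$$

Thus, the last $K - 1$ components must be selected in decreasing order of $\pi_k(t_1)c_k/\mathbf{E}^{A2}(N_k|H_0)$ and tested by the SPRT.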
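The belief update used in Step 2.1 is an application of Bayes' rule under the exclusive model, and can be sketched as follows. The helper name and the list-based representation are assumptions made for the example; `likelihood_ratio` stands for the ratio of the abnormal to the normal density evaluated at the observations collected from the tested component. The common factor applied to all untested components is why their relative ordering is preserved.

```python
def update_beliefs(prior, j, likelihood_ratio):
    """Posterior beliefs after testing component j under the exclusive model.

    prior: list of probabilities (summing to 1) that each component is the abnormal one.
    likelihood_ratio: abnormal-to-normal density ratio of component j's observations.
    """
    gamma = 1.0 / (prior[j] * likelihood_ratio + 1.0 - prior[j])
    posterior = [gamma * p for p in prior]              # scaling of untested components
    posterior[j] = gamma * prior[j] * likelihood_ratio  # Bayes update for component j
    return posterior
```

For instance, with priors `[0.5, 0.3, 0.2]` and a likelihood ratio of 0.01 for component 0, most of the probability mass moves to components 1 and 2 while their ratio stays at 1.5.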

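Step 2.2: Proving the theorem for all the $K$ components:

Finally, we show that component 1 (i.e., the component with the highest index) must be selected first. The expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(j)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau'(t), \delta'(t), \phi^{(j)}) \right\} = \pi_j(t_1)c_j\,\mathbf{E}'(N_j|H_1, t_1) + \sum_{k=1, k \neq j}^{K} \left[ \pi_k(t_1)c_k \times \left( \mathbf{E}'(N_j|H_0, t_1) + \sum_{i=1, i \neq j}^{k-1} \mathbf{E}^{A2}(N_i|H_0) + \mathbf{E}^{A2}(N_k|H_1) \right) \right]. \tag{43}$$

First, note that the expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(j)})$ can be further reduced for all $j$ by minimizing the expected sample size $\mathbf{E}'(N_j|H_i, t_1)$ for $i = 0, 1$, which is achieved by $(\tau^{A2}, \delta^{A2})$. Therefore, an optimal solution must be $(\tau^{A2}, \delta^{A2}, \phi^{(j)})$ for an optimal selection rule $\phi^{(j)}$. Thus, in the following we consider solutions of the form $(\tau^{A2}, \delta^{A2}, \phi)$.

Next, by contradiction, consider an optimal selection rule $\phi^{(j \neq 1)}$ that selects component $j \neq 1$ first. Therefore, $\phi^{(j \neq 1)}$ selects the components in the following order: $j, 1, 2, ..., j-1, j+1, ..., K$.

As a result, the expected cost achieved by $(\tau^{A2}, \delta^{A2}, \phi^{(j \neq 1)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau^{A2}, \delta^{A2}, \phi^{(j \neq 1)}) \right\} = \pi_j(t_1)c_j\,\mathbf{E}^{A2}(N_j|H_1) + \pi_1(t_1)c_1 \left[ \mathbf{E}^{A2}(N_j|H_0) + \mathbf{E}^{A2}(N_1|H_1) \right] + \sum_{k=2, k \neq j}^{K} \left[ \pi_k(t_1)c_k \times \left( \mathbf{E}^{A2}(N_j|H_0) + \sum_{i=1, i \neq j}^{k-1} \mathbf{E}^{A2}(N_i|H_0) + \mathbf{E}^{A2}(N_k|H_1) \right) \right]. \tag{44}$$

We use the interchange argument to prove the theorem. Consider a selection rule $\phi^{(1)}$ that selects component 1 first followed by components $j, 2, 3, ..., j-1, j+1, ..., K$. Similar to (44), the expected cost achieved by $(\tau^{A2}, \delta^{A2}, \phi^{(1)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau^{A2}, \delta^{A2}, \phi^{(1)}) \right\} = \pi_1(t_1)c_1\,\mathbf{E}^{A2}(N_1|H_1) + \pi_j(t_1)c_j \left[ \mathbf{E}^{A2}(N_1|H_0) + \mathbf{E}^{A2}(N_j|H_1) \right] + \sum_{k=2, k \neq j}^{K} \left[ \pi_k(t_1)c_k \times \left( \mathbf{E}^{A2}(N_j|H_0) + \sum_{i=1, i \neq j}^{k-1} \mathbf{E}^{A2}(N_i|H_0) + \mathbf{E}^{A2}(N_k|H_1) \right) \right]. \tag{45}$$

By comparing (44) and (45), it can be verified that:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau^{A2}, \delta^{A2}, \phi^{(1)}) \right\} \le \mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau^{A2}, \delta^{A2}, \phi^{(j \neq 1)}) \right\}$$

since $\pi_1(t_1)c_1/\mathbf{E}^{A2}(N_1|H_0) \ge \pi_j(t_1)c_j/\mathbf{E}^{A2}(N_j|H_0)$.

The expected cost can be reduced by selecting component 1 first followed by component $j$, which contradicts the optimality of $\phi^{(j \neq 1)}$. Hence, at time $t_1$ selecting component 1 minimizes the expected cost. We have already proved that selecting the last $K - 1$ components in decreasing order of $\pi_k(t_1)c_k/\mathbf{E}^{A2}(N_k|H_0)$ minimizes the objective function, which completes the proof. ■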

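B. Proof of Theorem 1 Under the Independent Model

Let $\mathbf{E}'(N_k|H_i, t)$ be the expected sample size achieved by a stopping rule and a decision rule $(\tau'_k(t), \delta'_k(t))$, depending on the time that component $k$ is tested (i.e., $(\tau'_k(t), \delta'_k(t))$ depend on the selection rule), such that error constraints are satisfied. Let $\mathbf{E}^{A1}(N_k|H_i)$ be the expected sample size achieved by the SPRT's stopping rule and decision rule $(\tau^{A1}, \delta^{A1})$, independent of the time that component $k$ is tested (i.e., $(\tau^{A1}, \delta^{A1})$ are independent of the selection rule), such that error constraints are satisfied. Clearly, $\mathbf{E}^{A1}(N_k|H_i) \le \mathbf{E}'(N_k|H_i, t)$ for all $k, t$, for $i = 0, 1$, and these values are achieved by Algorithm 1.

First, consider the case where $K = 2$. Assume that

$$\frac{\pi_1(t_1)c_1}{\mathbf{E}^{A1}(N_1)} \ge \frac{\pi_2(t_1)c_2}{\mathbf{E}^{A1}(N_2)}.$$

Consider selection rules $\phi^{(1)}, \phi^{(2)}$ that select component 1 first followed by component 2 and component 2 first followed by component 1, respectively. The expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(2)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau'(t), \delta'(t), \phi^{(2)}) \right\} = \mathbf{E}'(N_2|H_1, t_1)\,\pi_2(t_1)c_2 + \left( \mathbf{E}'(N_2|t_1) + \mathbf{E}'(N_1|H_1, t_2) \right) \pi_1(t_1)c_1.$$

The expected cost achieved by $(\tau'(t), \delta'(t), \phi^{(1)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau'(t), \delta'(t), \phi^{(1)}) \right\} = \mathbf{E}'(N_1|H_1, t_1)\,\pi_1(t_1)c_1 + \left( \mathbf{E}'(N_1|t_1) + \mathbf{E}'(N_2|H_1, t_2) \right) \pi_2(t_1)c_2.$$

Note that the expected cost achieved by both selection rules can be further reduced by minimizing the expected sample sizes (such that error constraints are satisfied) independent of the selection rules, which is achieved by $(\tau^{A1}, \delta^{A1})$. Therefore, an optimal solution must be $(\tau^{A1}, \delta^{A1}, \phi^{(1)})$ or $(\tau^{A1}, \delta^{A1}, \phi^{(2)})$. Next, we use the interchange argument to prove the theorem for $K = 2$. The expected cost achieved by $(\tau^{A1}, \delta^{A1}, \phi^{(2)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau^{A1}, \delta^{A1}, \phi^{(2)}) \right\} = \mathbf{E}^{A1}(N_2|H_1)\,\pi_2(t_1)c_2 + \left( \mathbf{E}^{A1}(N_2) + \mathbf{E}^{A1}(N_1|H_1) \right) \pi_1(t_1)c_1.$$

The expected cost achieved by $(\tau^{A1}, \delta^{A1}, \phi^{(1)})$ is given by:

$$\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau^{A1}, \delta^{A1}, \phi^{(1)}) \right\} = \mathbf{E}^{A1}(N_1|H_1)\,\pi_1(t_1)c_1 + \left( \mathbf{E}^{A1}(N_1) + \mathbf{E}^{A1}(N_2|H_1) \right) \pi_2(t_1)c_2.$$

The expected cost achieved by $\phi^{(1)}$ is lower than that achieved by $\phi^{(2)}$ since $\frac{\pi_1(t_1)c_1}{\mathbf{E}^{A1}(N_1)} \ge \frac{\pi_2(t_1)c_2}{\mathbf{E}^{A1}(N_2)}$, which completes the proof for $K = 2$.

The rest of the proof follows by induction on the number of components, as was done under the exclusive model. ■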
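The interchange argument can be checked numerically on a small instance. The sketch below assumes that, under the independent model with one component probed at a time, the cost of a testing order is the sum over components of $\pi_k c_k$ times the expected time spent on the components tested earlier plus $\mathbf{E}(N_k|H_1)$, which generalizes the $K = 2$ expressions above. The function names and the numbers in the usage example are hypothetical.

```python
from itertools import permutations


def expected_cost(order, prior, cost, en, en_h1):
    """Expected total cost of testing the components in the given order.

    en[k]    : expected sample size of component k (unconditional)
    en_h1[k] : expected sample size of component k given that it is abnormal
    """
    total, elapsed = 0.0, 0.0
    for k in order:
        total += prior[k] * cost[k] * (elapsed + en_h1[k])
        elapsed += en[k]
    return total


def index_order(prior, cost, en):
    """Components sorted in decreasing order of the index prior * cost / en."""
    return sorted(range(len(prior)),
                  key=lambda k: prior[k] * cost[k] / en[k],
                  reverse=True)


def brute_force_cost(prior, cost, en, en_h1):
    """Minimal expected cost over all testing orders (exhaustive search)."""
    return min(expected_cost(order, prior, cost, en, en_h1)
               for order in permutations(range(len(prior))))
```

For example, with `prior = [0.8, 0.5, 0.3, 0.6]`, `cost = [10, 40, 70, 100]`, `en = [5.0, 3.0, 8.0, 4.0]` and `en_h1 = [6.0, 2.0, 9.0, 5.0]`, the index order is `[3, 1, 2, 0]` and its cost matches the exhaustive-search minimum.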


C.  Proof  of  Theorem  2 

For  every  k ,  let  E*(Nk\Hi)  be  the  minimal  expected 
sample  size  that  can  be  achieved  by  any  sequential  test,  such 
that  error  constraints  are  satisfied.  Let  EA3(Nk\Hf)  be  the 
expected  sample  size  achieved  by  Algorithm  3,  such  that  error 
constraints  are  satisfied.  Clearly,  E*  (Nk\H,)  <  EA3(Nk\Hi) 
for  all  k,  for  i  =  0, 1. 

Assume  that 

7T2(fl)c2  TtKf^CK 

E*(iVi)  -  E*(1V2)  E *{Nk)  ■  1  ’ 

Similar to the proof of Theorem 1, it can be verified that
the optimal solution to (2) is to select the components in
the following order: $1, 2, \ldots, K$, where the components are
tested by a sequential test that achieves expected sample size
$\mathbf{E}^*(N_k | H_i)$ for all $k$, for $i = 0, 1$. Therefore, the expected
cost achieved by $(\tau^*, \delta^*, \phi^*)$ is given by:


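$$
\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau^*, \delta^*, \phi^*) \right\}
= \sum_{k=1}^{K} \pi_k(t_1) c_k \left( \sum_{i=1}^{k-1} \mathbf{E}^*(N_i) + \mathbf{E}^*(N_k | H_1) \right) .
\tag{51}
$$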
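
As a numerical sanity check of the ordering claim, the sketch below evaluates the right-hand side of (51) for every probing order of a small example and compares the minimizer with the order given by decreasing $\pi_k(t_1) c_k / \mathbf{E}(N_k)$. All numbers and function names are illustrative assumptions, not values or code from the paper, and the unconditional sample size is assumed to be the prior-weighted mixture $\mathbf{E}(N_k) = \pi_k \mathbf{E}(N_k | H_1) + (1 - \pi_k) \mathbf{E}(N_k | H_0)$.

```python
from itertools import permutations


def expected_cost(order, prior, cost, n_avg, n_h1):
    """Evaluate the right-hand side of (51) for a given probing order.

    prior[k] stands for pi_k(t_1), cost[k] for c_k,
    n_avg[k] for E(N_k) and n_h1[k] for E(N_k | H_1).
    """
    total = 0.0
    elapsed = 0.0  # expected time spent on components probed earlier
    for k in order:
        total += prior[k] * cost[k] * (elapsed + n_h1[k])
        elapsed += n_avg[k]
    return total


def index_order(prior, cost, n_avg):
    """Return component indices sorted by decreasing pi_k * c_k / E(N_k)."""
    return sorted(range(len(prior)),
                  key=lambda k: -prior[k] * cost[k] / n_avg[k])


# Illustrative numbers only (assumed for this sketch).
prior = [0.2, 0.5, 0.1, 0.4]
cost = [3.0, 1.0, 8.0, 2.0]
n_h0 = [10.0, 6.0, 12.0, 9.0]
n_h1 = [14.0, 5.0, 20.0, 7.0]
# Assumed mixture form of the unconditional expected sample size.
n_avg = [p * a + (1 - p) * b for p, a, b in zip(prior, n_h1, n_h0)]

# Exhaustive search over all K! probing orders.
best = min(permutations(range(len(prior))),
           key=lambda o: expected_cost(o, prior, cost, n_avg, n_h1))
greedy = index_order(prior, cost, n_avg)
print("index order:", greedy, "brute-force optimum:", list(best))
```

The exhaustive search is only feasible for small $K$; the point of the index policy is that sorting replaces the search over $K!$ orders.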


By the asymptotic optimality property of the SALRT/SGLRT
for a single process (used in Algorithm 3), it follows that
$\mathbf{E}^{A3}(N_k | H_i) \sim \mathbf{E}^*(N_k | H_i)$ for all $k$, for $i = 0, 1$ as
$P_k^{FA} \to 0$, $P_k^{MD} \to 0$. As a result, for sufficiently small
error probabilities, the solution $(\tau^{A3}, \delta^{A3}, \phi^{A3})$ is to select
the components in the following order: $1, 2, \ldots, K$, where the
components are tested by an asymptotically optimal sequential
test that achieves expected sample size $\mathbf{E}^{A3}(N_k | H_i)$ for all
$k$, for $i = 0, 1$. Therefore, the expected cost achieved by
$(\tau^{A3}, \delta^{A3}, \phi^{A3})$ is given by:


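$$
\mathbf{E}\left\{ \sum_{k=1}^{K} c_k \tau_k \mathbf{1}_{\{k \in \mathcal{H}_1\}} \,\middle|\, (\tau^{A3}, \delta^{A3}, \phi^{A3}) \right\}
= \sum_{k=1}^{K} \pi_k(t_1) c_k \left( \sum_{i=1}^{k-1} \mathbf{E}^{A3}(N_i) + \mathbf{E}^{A3}(N_k | H_1) \right) .
\tag{52}
$$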


Since $\mathbf{E}^{A3}(N_k | H_i) \sim \mathbf{E}^*(N_k | H_i)$ for $i = 0, 1$ as $P_k^{FA} \to 0$,
$P_k^{MD} \to 0$ for all $k$, the theorem follows. ■
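
The last step can be spelled out as a sandwich bound. The sketch below uses shorthand introduced here, not in the paper: $J^*$ and $J^{A3}$ for the costs in (51) and (52), and $\epsilon > 0$ for a slack that vanishes as the error probabilities tend to zero. It also assumes that the unconditional sample size is the prior-weighted mixture of the conditional ones, so that the same factor $(1 + \epsilon)$ bounds $\mathbf{E}^{A3}(N_i)$.

```latex
% Assumption of this note: for all k and i = 0, 1,
%   E^*(N_k|H_i) \le E^{A3}(N_k|H_i) \le (1+\epsilon) E^*(N_k|H_i),
% and both policies probe in the order 1, 2, ..., K (small error probabilities).
J^{*} \;\leq\; J^{A3}
  = \sum_{k=1}^{K} \pi_k(t_1) c_k
      \Big( \sum_{i=1}^{k-1} \mathbf{E}^{A3}(N_i) + \mathbf{E}^{A3}(N_k | H_1) \Big)
  \;\leq\; (1+\epsilon) \sum_{k=1}^{K} \pi_k(t_1) c_k
      \Big( \sum_{i=1}^{k-1} \mathbf{E}^{*}(N_i) + \mathbf{E}^{*}(N_k | H_1) \Big)
  = (1+\epsilon)\, J^{*} .
```

Hence $1 \leq J^{A3} / J^* \leq 1 + \epsilon$, and the ratio tends to 1 as $\epsilon \to 0$. The left inequality holds because $J^*$ is the optimal value of (2) and Algorithm 3 satisfies the error constraints.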


D.  Proof  of  Theorem  3

The structure of the proof is similar to the proof of Theorem
2. Hence, we provide a sketch of the proof, using notation
similar to that used in the proof of Theorem 2. Similar to
the proof of Theorem 1, it can be verified that the optimal
solution to (2) is to select the components in decreasing
order of $\pi_k(t_1) c_k / \mathbf{E}^*(N_k | H_0)$, where the components are
tested by a sequential test that achieves expected sample size
$\mathbf{E}^*(N_k | H_i)$ for all $k$, for $i = 0, 1$. By the asymptotic optimality
property of the SALRT/SGLRT for a single process (used
in Algorithm 4), it follows that $\mathbf{E}^{A4}(N_k | H_i) \sim \mathbf{E}^*(N_k | H_i)$
for all $k$, for $i = 0, 1$ as $P_k^{FA} \to 0$, $P_k^{MD} \to 0$. As a
result, for sufficiently small error probabilities, the solution
$(\tau^{A4}, \delta^{A4}, \phi^{A4})$ is to select the components in decreasing
order of $\pi_k(t_1) c_k / \mathbf{E}^*(N_k | H_0)$, where the components are
tested by an asymptotically optimal sequential test that achieves
expected sample size $\mathbf{E}^{A4}(N_k | H_i)$ for all $k$, for $i = 0, 1$.
Similar to the proof of Theorem 2, comparing the objective
functions achieved by $(\tau^*, \delta^*, \phi^*)$ and $(\tau^{A4}, \delta^{A4}, \phi^{A4})$
proves the theorem. ■
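
The interchange step behind this ordering can be illustrated numerically. The sketch below rests on assumptions that are not stated in this section: that Theorem 3 concerns the exclusive model, that tests are reliable, and that the expected cost of a probing order is $\sum_m \pi_m c_m \big( \sum_{j \text{ before } m} \mathbf{E}(N_j | H_0) + \mathbf{E}(N_m | H_1) \big)$, since every component probed before the anomalous one is normal. The function name and all numbers are hypothetical.

```python
def exclusive_cost(order, prior, cost, n_h0, n_h1):
    """Expected cost of a probing order under the assumed exclusive-model cost.

    If component m is the anomalous one, every component probed before it is
    normal, so the waiting time is a sum of E(N_j | H_0) plus E(N_m | H_1).
    """
    total = 0.0
    elapsed_h0 = 0.0  # expected time spent on earlier (normal) components
    for k in order:
        total += prior[k] * cost[k] * (elapsed_h0 + n_h1[k])
        elapsed_h0 += n_h0[k]
    return total


# Illustrative numbers only; priors sum to 1 (exactly one anomaly).
prior = [0.1, 0.4, 0.2, 0.3]
cost = [5.0, 1.0, 3.0, 2.0]
n_h0 = [10.0, 5.0, 15.0, 9.0]
n_h1 = [12.0, 4.0, 18.0, 7.0]

# Index order: decreasing pi_k * c_k / E(N_k | H_0).
order = sorted(range(len(prior)),
               key=lambda k: -prior[k] * cost[k] / n_h0[k])
base = exclusive_cost(order, prior, cost, n_h0, n_h1)

# Interchange check: swap each adjacent pair and recompute the cost.
swap_costs = []
for j in range(len(order) - 1):
    swapped = list(order)
    swapped[j], swapped[j + 1] = swapped[j + 1], swapped[j]
    swap_costs.append(exclusive_cost(swapped, prior, cost, n_h0, n_h1))

print("index order:", order)
print("every adjacent swap increases the cost:",
      all(c > base for c in swap_costs))
```

Swapping adjacent components $a$ (first) and $b$ changes the cost by $\pi_a c_a \mathbf{E}(N_b | H_0) - \pi_b c_b \mathbf{E}(N_a | H_0)$ under this assumed cost, which is positive exactly when $a$ has the larger index.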